Association Coefficient Measures for Document Clustering
Abstract
This paper presents association coefficient measures for document clustering. The proposed measures are based on intuitionistic fuzzy sets, and twelve of them, denoted f1 to f12, are used. The document clustering process consists of six steps: document collection, text pre-processing, feature selection, indexing, clustering, and results analysis. The experiments use the 20 Newsgroups data set [17], and the experimental results were analyzed with SAS 9.0 analytics software. The experimental results show that the proposed approach outperforms the approaches it is compared against.
Keywords: Intuitionistic Fuzzy Sets, Association Coefficient Measure, Clustering.
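The abstract names twelve association coefficient measures (f1 to f12) but does not give their formulas, so the sketch below does not reproduce any of them. As an assumption, it uses one standard correlation coefficient for intuitionistic fuzzy sets (the form usually credited to Gerstenkorn and Mańko): K(A, B) = C(A, B) / sqrt(E(A) E(B)), where C(A, B) = Σ(μ_A μ_B + ν_A ν_B) and E(A) = Σ(μ_A² + ν_A²). The mapping from term weights to membership and non-membership degrees (`doc_to_ifs`, with a fixed `hesitation` fraction) and the threshold-based single-link grouping (`threshold_clusters`) are hypothetical stand-ins for the paper's indexing and clustering steps, not the authors' method.

```python
import math
from typing import List, Tuple

# An intuitionistic fuzzy set (IFS) over a fixed, ordered vocabulary:
# one (membership mu, non-membership nu) pair per term, with mu + nu <= 1.
IFS = List[Tuple[float, float]]


def doc_to_ifs(weights: List[float], hesitation: float = 0.1) -> IFS:
    """Hypothetical mapping from non-negative term weights to an IFS.

    mu is the weight scaled by the document's largest weight; a fixed
    fraction `hesitation` of the remaining mass (1 - mu) is left undecided,
    so nu = (1 - mu) * (1 - hesitation) and mu + nu <= 1 always holds.
    """
    top = max(weights) if weights and max(weights) > 0 else 1.0
    return [(w / top, (1.0 - w / top) * (1.0 - hesitation)) for w in weights]


def informational_energy(a: IFS) -> float:
    # E(A) = sum of mu^2 + nu^2 over all terms.
    return sum(mu * mu + nu * nu for mu, nu in a)


def correlation(a: IFS, b: IFS) -> float:
    # C(A, B) = sum of mu_A * mu_B + nu_A * nu_B over all terms.
    return sum(ma * mb + na * nb for (ma, na), (mb, nb) in zip(a, b))


def correlation_coefficient(a: IFS, b: IFS) -> float:
    """K(A, B) = C(A, B) / sqrt(E(A) * E(B)); lies in [0, 1] for valid IFSs."""
    if len(a) != len(b):
        raise ValueError("IFSs must be defined over the same vocabulary")
    denom = math.sqrt(informational_energy(a) * informational_energy(b))
    return correlation(a, b) / denom if denom > 0 else 0.0


def threshold_clusters(docs: List[IFS], threshold: float) -> List[List[int]]:
    """Single-link grouping: documents joined by a chain of pairs with
    K >= threshold end up in the same cluster (connected components)."""
    n = len(docs)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not seen[j] and correlation_coefficient(docs[i], docs[j]) >= threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Made-up term weights for four documents over a four-term vocabulary.
    raw = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 3, 1]]
    docs = [doc_to_ifs(w) for w in raw]
    print(threshold_clusters(docs, threshold=0.8))  # [[0, 1], [2, 3]]
```

With this mapping, a term absent from both documents still gives each a non-membership of 0.9 and so raises the coefficient; the choice of `hesitation` and of the threshold therefore shapes the clusters, and a real pipeline would tune both or use a different mapping.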
Similar Resources
Comprehensive Survey on Distance / Similarity Measures between Probability Density Functions
Distance or similarity measures are essential to solve many pattern recognition problems such as classification, clustering, and retrieval problems. Various distance/similarity measures that are applicable to compare two probability density functions, pdf in short, are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering ...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
NESM: a Named Entity based Proximity Measure for Multilingual News Clustering
Measuring the similarity between documents is an essential task in Document Clustering. This paper presents a new metric that is based on the number and the category of the Named Entities shared between news documents. Three different feature-weighting functions and two standard similarity measures were used to evaluate the quality of the proposed measure in multilingual news clustering. The re...
Comparative Study of k-means and k-Means++ Clustering Algorithms on Crime Domain
This study presents the results of an experimental study of two document clustering techniques, k-means and k-means++. In particular, we compare the two main approaches in crime document clustering. The drawback of k-means is that the user needs to define the centroid point. This becomes more critical when dealing with document clustering because each center point represented by a word ...
Arabic text summarization based on latent semantic analysis to enhance Arabic documents clustering
Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in the Arabic language. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by the document...